Quantile Encoder: Tackling High Cardinality Categorical Features in Regression Problems

Abstract

Regression problems have been widely studied in the machine learning literature, resulting in a plethora of regression models and performance measures. However, there are few techniques specially dedicated to solving the problem of how to incorporate categorical features into regression problems. Usually, categorical feature encoders are general enough to cover both classification and regression problems. This lack of specificity results in underperforming regression models. In this paper, we provide an in-depth analysis of how to tackle high cardinality categorical features with the quantile. Our proposal outperforms state-of-the-art encoders, including the traditional statistical mean target encoder, when considering the Mean Absolute Error, especially in the presence of long-tailed or skewed distributions. Besides, to deal with possible overfitting when there are categories with small support, our encoder benefits from additive smoothing. Finally, we describe how to expand the encoded values by creating a set of features with different quantiles. This expanded encoder provides a more informative output about the categorical feature in question, further boosting the performance of the regression model.
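The abstract names three ingredients: a per-category quantile of the target, additive smoothing for categories with small support, and an expansion into several quantiles. The following is a minimal Python sketch of how these could fit together. The function names, the parameters `p` and `m`, and the exact smoothing form (a count-weighted blend of the category quantile and the global quantile) are assumptions made for illustration and are not taken from the abstract.

```python
from collections import defaultdict


def quantile(values, p):
    """Empirical p-quantile using linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def fit_quantile_encoder(categories, targets, p=0.5, m=1.0):
    """Map each category to a smoothed p-quantile of its target values.

    Assumed smoothing form (hypothetical, for illustration):
        (n_c * q_c + m * q_global) / (n_c + m)
    where n_c is the category count, q_c its p-quantile, q_global the
    p-quantile of all targets, and m >= 0 the smoothing strength.
    """
    groups = defaultdict(list)
    for category, y in zip(categories, targets):
        groups[category].append(y)
    q_global = quantile(targets, p)
    encoding = {
        category: (len(ys) * quantile(ys, p) + m * q_global) / (len(ys) + m)
        for category, ys in groups.items()
    }
    return encoding, q_global


def transform(categories, encoding, default):
    """Encode categories; unseen ones fall back to `default`."""
    return [encoding.get(category, default) for category in categories]


def fit_multi_quantile(categories, targets, ps=(0.25, 0.5, 0.75), m=1.0):
    """Expanded variant: one smoothed encoding per requested quantile."""
    return {p: fit_quantile_encoder(categories, targets, p, m)[0] for p in ps}


if __name__ == "__main__":
    cats = ["a", "a", "a", "b"]
    ys = [1.0, 2.0, 100.0, 10.0]
    enc, q_global = fit_quantile_encoder(cats, ys, p=0.5, m=1.0)
    print(enc, q_global)
    print(transform(["b", "unseen"], enc, q_global))
```

In this toy data, category "a" contains one extreme target (100.0). Its median stays at 2.0 while its mean would be pulled above 34, which is the kind of long-tailed situation the abstract refers to. With `m = 0` the encoding reduces to the raw per-category quantile; larger `m` pulls sparsely populated categories toward the global quantile.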

Similar resources

Local Polynomial Quantile Regression With Parametric Features

We propose a new approach to conditional quantile function estimation that combines both parametric and nonparametric techniques. At each design point, a global, possibly incorrect, pilot parametric model is locally adjusted through a kernel smoothing fit. The resulting quantile regression estimator behaves like a parametric estimator when the latter is correct and converges to the nonparametri...

High-Dimensional Structured Quantile Regression

Quantile regression aims at modeling the conditional median and quantiles of a response variable given certain predictor variables. In this work we consider the problem of linear quantile regression in high dimensions where the number of predictor variables is much higher than the number of samples available for parameter estimation. We assume the true parameter to have some structure character...

Moment-Based Quantile Sketches for Efficient High Cardinality Aggregation Queries

Interactive analytics increasingly involves querying for quantiles over specific sub-populations and time windows of high cardinality datasets. Data processing engines such as Druid and Spark use mergeable summaries to estimate quantiles on these large datasets, but summary merge times are a bottleneck during high-cardinality aggregation. We show how a compact and efficiently mergeable quantile...

Quantile-based categorical statistics

Traditional point-to-point verification is more and more superseded by situation-based verification such as an object-oriented mode. One main reason is that difficulties are encountered while interpreting the outcome of a conventional contingency table based on amplitude thresholds. Firstly, a predetermined amplitude threshold splits the distributions under comparison at an unknown location. In...

Extremal Quantile Regression

Quantile regression is an important tool for estimation of conditional quantiles of a response Y given a vector of covariates X. It can be used to measure the effect of covariates not only in the center of a distribution, but also in the upper and lower tails. This paper develops a theory of quantile regression in the tails. Specifically, it obtains the large sample properties of extremal (ext...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2021

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-030-85529-1_14